Lecture 20: In-Memory OLTP

Background on OLAP and OLTP

It is challenging to design a DBMS to do well at both OLTP and OLAP. It is especially challenging to tune a DBMS deployment to perform well at both. If you think about the mix of long- and short-running queries, as well as the mix of isolaton-level requirements for the two workloads, you can imagine why: long-running queries and short-running queries don’t coexist happily.

Hence conventional wisdom is to buy and/or deploy separate instances for OLTP (“operational DBMSs”) and for OLAP (“data warehouse”). This also acknowledges the fact that different OLTP databases tend to “belong” to different applications/devops groups, whereas the data warehouse tends to span all the data and “belong” to cross-cutting “decision makers” – e.g. management or their “data science” team.

Where’s the MM OLTP Market?

Lots of evidence that the OLTP market has been harder to crack than the OLAP market in the last decade or 2.

  1. I have been told by people involved in the early days of Snowflake that they originally wanted to go after OLTP on Flash, but on analysis there just wasn’t a 10-100x win there over conventional technology. Then they looked at OLAP in the cloud and saw opportunity.
  2. Viktor Leis at TUM and Andy Pavlo at CMU have done research on (in-memory) OLTP. I asked them about this and they said:

    Viktor: “the commercial success of in-memory OLTP has been – let’s say – limited. Almost all the commercial action has been in the analytics space…This is despite lots of innovative research showing that new OLTP designs are orders of magnitude faster than traditional disk-based systems. My personal guess is that this is due to three reasons:

    • Operational OLTP is a difficult and conservative market.
    • DRAM prices per GB stopped decreasing. So it turned out that storing everything in DRAM is expensive and does not scale….
    • The millions of transactions per second that academic in-memory systems can achieve are kind of fantasy numbers. Users don’t like to put all their logic into stored procedures and it’s very hard to get that many transactions into the DBMS through the network.

    Andy: “Enterprises still run most of their txn-heavy workloads on Oracle, DB2, Sybase, MSSQL, and NonStop. Postgres is seeing an uptick in adoption but they’re not running at the scale of these heavies. HFTs are the only people running hardcore +1m txn/sec workloads but they almost always roll their own systems.

    Real-world stored-procedures are super rare.”

Probably the most widely-deployed in-memory OLTP system is Hekaton, which is built-in to MS SQL Server. I am not aware of how often it gets “used”, but I’ve asked some friends.

SILO: an in-memory OLTP design

Setup

Some keys to high-performance

Techniques

Context: H-Store and Masstree

Main contributions here are the epoch-based OCC protocol.

See paper for details of: